A Fast Parallel Tridiagonal Algorithm for a Class of CFD Applications
Abstract
The parallel diagonal dominant (PDD) algorithm is an efficient tridiagonal solver. This paper presents for study a variation of the PDD algorithm, the reduced PDD algorithm. The new algorithm maintains the minimum communication provided by the PDD algorithm, but has a reduced operation count. The reduced PDD algorithm also has a smaller operation count than the conventional sequential algorithm for many applications. Accuracy analysis is provided for the reduced PDD algorithm for symmetric Toeplitz tridiagonal (STT) systems. Implementation results on Langley's Intel Paragon and IBM SP2 show that both the PDD and reduced PDD algorithms are efficient and scalable.

1.0. Introduction

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as the Kendall Square KSR-1, Intel Paragon, TMC CM-5, and the recently announced IBM SP2 and Cray T3D concurrent systems, successfully deliver high-performance computing power for solving some of the so-called "grand-challenge" problems (ref. 1). Despite initial success, parallel machines have not been widely accepted in the production engineering environment.

On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to achieve load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm might still be unsatisfactory because conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms must be introduced to increase parallelism and to take advantage of the computing power of the scalable parallel hardware.

Solving tridiagonal systems is a basic computational kernel of many computational fluid dynamics (CFD) applications.
Tridiagonal systems appear in multigrid methods, the alternating direction implicit (ADI) method, the wavelet collocation method, and in-line successive over-relaxation (SOR) preconditioners for conjugate gradient methods (ref. 2). In addition to solving partial differential equations (PDEs), tridiagonal systems also arise in digital signal processing, image processing, stationary time series analysis, and spline curve fitting (ref. 3). One direct motivation for developing an efficient kernel for solving tridiagonal systems at the National Aeronautics and Space Administration (NASA) is that the implicit systems of compact schemes (ref. 4), which are relatively new finite-difference schemes widely used in production codes at Langley Research Center and Ames Research Center, are tridiagonal.

Intensive research has been carried out on the development of efficient parallel tridiagonal solvers. Many algorithms have been proposed (refs. 5, 6, and 7), including the recursive doubling reduction method (RCD) developed by Stone (ref. 8) and the cyclic reduction or odd-even reduction method (OER) developed by Hockney (ref. 9). In general, parallel tridiagonal solvers require global communications, which makes them inefficient on distributed-memory architectures. Recently, we have taken a new approach: to increase parallel performance by introducing a bounded numerical error. Two new algorithms, namely the parallel diagonal dominant (PDD) algorithm (ref. 2) and the simple parallel prefix (SPP) algorithm (ref. 10), have been proposed for multiple-instruction multiple-data (MIMD) and single-instruction multiple-data (SIMD) machines, respectively. These two algorithms take advantage of the fact that tridiagonal systems arising in compact schemes are diagonally dominant. Backed by rigorous accuracy analyses, the algorithms truncate communication and computation without degrading the accuracy of the calculations.
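For orientation, the following is a minimal Python sketch of a sequential tridiagonal solve together with a row-wise diagonal dominance check. It assumes that the "conventional sequential algorithm" is the Thomas algorithm (Gaussian elimination without pivoting); the text does not name the algorithm, so this is an assumption. The function names `is_diagonally_dominant` and `thomas_solve`, and the convention of storing the three diagonals as equal-length lists, are illustrative choices and do not come from the paper. The sketch is not the PDD or reduced PDD algorithm.

```python
def is_diagonally_dominant(lower, diag, upper):
    """Return True if |diag[i]| >= |lower[i]| + |upper[i]| holds in every row.

    Assumed storage: all three lists have length n; lower[0] and
    upper[n-1] lie outside the matrix and are ignored.
    """
    n = len(diag)
    for i in range(n):
        off_diagonal = 0.0
        if i > 0:
            off_diagonal += abs(lower[i])
        if i < n - 1:
            off_diagonal += abs(upper[i])
        if abs(diag[i]) < off_diagonal:
            return False
    return True


def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.

    No pivoting is performed, which is safe for diagonally dominant
    systems with a nonzero leading diagonal entry.
    """
    n = len(diag)
    c_prime = [0.0] * n  # modified superdiagonal
    d_prime = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the subdiagonal row by row.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution: recover the unknowns from last to first.
    x = [0.0] * n
    x[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# Illustrative 4x4 symmetric Toeplitz tridiagonal system (diagonal 4, off-diagonals 1).
lower = [0.0, 1.0, 1.0, 1.0]
diag = [4.0, 4.0, 4.0, 4.0]
upper = [1.0, 1.0, 1.0, 0.0]
rhs = [6.0, 12.0, 18.0, 19.0]  # chosen so that the exact solution is (1, 2, 3, 4)

dominant = is_diagonally_dominant(lower, diag, upper)
solution = thomas_solve(lower, diag, upper, rhs)
```

Each elimination step depends on the result of the previous row, which illustrates why a sequential solver of this kind is serial in nature and does not map directly onto a distributed-memory machine.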
In this paper, a new algorithm, the reduced PDD algorithm, is studied based on the same approach: increasing parallel performance by introducing a bounded numerical error. The reduced PDD algorithm, a variation of the PDD algorithm, maintains the minimum communication provided by the PDD algorithm, but has a reduced operation count. The reduced PDD algorithm also has a smaller operation count than the conventional sequential algorithm for many applications. The emphasis of this study is on implementation issues and performance comparisons of the PDD and reduced PDD algorithms. Most of the theoretical results, including the introduction of the PDD and reduced PDD algorithms, can be found in reference 2.

This paper is organized as follows. Section 2 provides the background of the PDD algorithm. Section 3 introduces the new algorithm, the reduced PDD algorithm. Section 4 gives an accuracy analysis for the reduced PDD algorithm. Experimental results on the Intel Paragon and IBM SP2 multicomputers are presented in section 5. Performance comparisons of the newly proposed algorithm with other existing algorithms, and of the two parallel platforms, are also discussed in this section. Section 6 provides concluding remarks.

2.0. Parallel Diagonal Dominant (PDD) Algorithm

A tridiagonal system is a linear system of equations

$$Ax = d \qquad (1)$$

where $x = (x_1, \ldots, x_n)^T$ and $d = (d_1, \ldots, d_n)^T$ are $n$-dimensional vectors and $A$ is a diagonally dominant tridiagonal matrix of order $n$:
Similar resources
GPGPU parallel algorithms for structured-grid CFD codes
A new high-performance general-purpose graphics processing unit (GPGPU) computational fluid dynamics (CFD) library is introduced for use with structured-grid CFD algorithms. A novel set of parallel tridiagonal matrix solvers, implemented in CUDA, is included for use with structured-grid CFD algorithms. The solver library supports both scalar and block-tridiagonal matrices suitable for approxima...
Application and Accuracy of the Parallel Diagonal Dominant Algorithm
The Parallel Diagonal Dominant (PDD) algorithm is an efficient tridiagonal solver. In this paper, a detailed study of the PDD algorithm is given. First the PDD algorithm is extended to solve periodic tridiagonal systems and its scalability is studied. Then the reduced PDD algorithm, which has a smaller operation count than that of the conventional sequential algorithm for many applications, is pr...
Mathematical Modeling and Analysis An Efficient, Numerically Stable, and Scalable Parallel Tridiagonal Solver
We describe a stable, efficient, parallel algorithm for the solution of diagonally dominant tridiagonal linear systems that scales well on distributed memory parallel computers. This algorithm is in the class of partitioning algorithms. Its multi-level recursive design makes it well suited for distributed memory parallel computers with very large numbers of processors. The need to solve large t...
Modular Coupling for Parallel Fluid-Structure Interaction Computations
A parallel CFD code capable of simulating flow within moving boundaries is coupled to a beam element structural dynamics code. The coupled codes are used to simulate fluid-structure interaction for a class of applications involving long and slender structures, e.g., suspension bridges and offshore risers. Due to the difference in size and dimensionality of the 3D CFD problem on one side, and th...
A VLSI Fast Solver for Tridiagonal Linear Systems
In this paper, the area-time complexity of a VLSI solver for tridiagonal linear systems is studied. Both a lower and an upper bound are derived which meet to within the exponent of the logarithmic factor. The proposed VLSI design derives from the parallel version of the well-known odd-even reduction algorithm [7] for tridiagonal linear systems, which requires O(log n) parallel steps. Then, the ...
Publication date: 1996